In quantum tomography [1], a quantum state or process is estimated
from the results of measurements on many identically prepared systems.
Tomography can never identify the state ($\rho$) or process
($\mathcal{E}$) exactly. Any point estimate is necessarily "wrong" - at
best, it will be close to the true state. Making rigorous, reliable
statements about the system requires region estimates. In this article,
I present a procedure for assigning likelihood ratio (LR) confidence
regions, an elegant and powerful generalization of error bars. In
particular, LR regions are almost optimally powerful - i.e., they are
as small as possible.

Quantum information processing relies on quantum hardware, including
memory qubits and unitary or nearly-unitary quantum gates. These
individual components must perform their allotted transformations with
very high precision, especially for fault-tolerant quantum computing.
The methods used to characterize and validate quantum devices are
known, collectively, as quantum tomography. Tomography usually involves
repeated independent measurements on $N$ identically-prepared systems
(referred to hereafter as "standard tomography"), but can also involve
collective measurements on all $N$ copies. Because state and process
tomography are mathematically equivalent, this paper will focus on
state tomography for the sake of clarity, with the understanding that
all results can be extended straightforwardly to processes [13].

Tomography cannot identify $\rho$ (the state produced by a quantum
device) exactly, for precisely the same reason that flipping a coin $N$
times cannot reveal its bias exactly. Any point estimate $\hat{\rho}$
has precisely zero probability of coinciding exactly with the true
$\rho$, for there are infinitely many other states arbitrarily close to
$\hat{\rho}$ and equally consistent with the data. To make a
tomographic assertion about the device that is true - or at least true
with high probability - we must report a region of states or processes
(see Fig. 1).

Such regions are often constructed by attaching error bars to a point
estimate. In quantum tomography, this approach suffers several
drawbacks, some of which are illustrated in Fig. 2. Naive error bars
define an ellipsoidal shape (arbitrary), centered at the point estimate
(suboptimal), which may include many unphysical states (inefficient).
Worst of all, it is generally impossible to assign this ellipsoid any
rigorous meaning - e.g., "The true state is within it, with probability
at least 99%." The same problem applies to the other method used to
date, bootstrapping - which means generating a host of simulated
datasets $\{D_k\}$ (either by resampling the real data, or by
simulating measurements on a point estimate $\hat{\rho}$), then
reporting the variance of the corresponding point estimates
$\{\hat{\rho}_k\}$. The underlying problem is that bootstrapping and
naive error bars both represent standard errors - the variance of a
point estimator. Unfortunately, the point estimators used in quantum
tomography are all biased, and standard errors for biased estimators do
not reliably represent uncertainty [14] about the true $\rho$.

FIG. 1: Point estimators, like the maximum likelihood estimate
$\hat{\rho}_{\mathrm{MLE}}$ shown on the left, cannot provide
meaningful and rigorous statements about the true (but unknown) state
$\rho$. But if we replace point estimators with region estimators, like
the likelihood-ratio confidence region shown on the right, then the
region $\mathcal{R}$ defines an assertion - "$\rho$ lies within
$\mathcal{R}$ with 90% certainty" - that is rigorously valid. The
estimates shown here came from simulated measurements on 60 copies of a
single-qubit state, with 20 measurements each of $\sigma_x$,
$\sigma_y$, $\sigma_z$ yielding +/- counts of 7/13, 9/11, and 3/17
(respectively).

FIG. 2: General region estimates - adapted to the data, and constructed
so as to minimize volume - can be far more powerful, useful, and
reliable than traditional "error bars". As illustrated here, a valid
confidence region need not be: (i) ellipsoidal or rectangular, (ii)
centered at a point estimate, or (iii) aligned with the axes defined by
whatever observables were measured. The figure on the right shows a
cross-section of a 1-qubit LR confidence region, while on the left the
smallest traditional error bars with the same coverage probability are
shown. The LR region is noticeably smaller, and includes only valid
states. Although in this case the LR region could be reasonably
approximated by the intersection between the error ellipsoid and the
Bloch sphere, this is not always the case.

Happily, all of these issues can be resolved with a remarkably simple
construction. Likelihood ratio (LR) confidence regions (see Fig. 3 for
some examples) generalize the notion of error bars, providing
data-adapted (not necessarily ellipsoidal) regions that:

1. contain the true state with guaranteed, high, and 
user-specified probability; 

2. are, on average, smaller (and thus more powerful) 
than almost any other construction; and 

3. are simple to define and construct. 

Definition 1. Given observed data $D$, the likelihood is a function on
states given by $\mathcal{L}(\rho) = \Pr(D|\rho)$. The log likelihood
ratio is a function on states given by
$\lambda(\rho) = -2 \log\left[ \mathcal{L}(\rho) / \max_{\rho'} \mathcal{L}(\rho') \right]$.
Given data $D$, the likelihood ratio region with confidence $\alpha$ is
$\mathcal{R}_\alpha(D) = \{\text{all } \rho \text{ such that } \lambda(\rho) < \lambda_\alpha\}$,
where $\lambda_\alpha$ is a constant (see below) that depends on the
desired confidence $\alpha$ and the Hilbert space dimension $d$.

The threshold value $\lambda_\alpha$ plays a critical role in this
construction. Increasing $\lambda_\alpha$ increases the size of the LR
region $\mathcal{R}_\alpha$, which in turn increases the estimator's
coverage probability - the probability that $\mathcal{R}(D)$ will
contain $\rho$ - but reduces its power (since large regions imply less
about $\rho$). So $\lambda_\alpha$ should be set to the smallest value
that ensures coverage probability at least $\alpha$. This optimal value
depends on $\alpha$ and the size of the system we are measuring. It is
hard to compute exactly, but two upper bounds are provided in Eq. 14
and Lemma 1. Either will guarantee coverage probability at least
$\alpha$, at the cost of slightly increasing the regions' size.
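As a concrete illustration of Definition 1, the following minimal sketch evaluates $\lambda(\rho)$ for the single-qubit data quoted in the caption of Fig. 1 (+/- counts of 7/13, 9/11, 3/17 for the three Pauli observables). The function names are hypothetical, and the sketch assumes that the unconstrained maximum of the likelihood lies inside the Bloch ball (true for these counts); otherwise a constrained optimizer would be needed. The threshold is left as a free parameter, since choosing it properly is the subject of the later sections.

```python
import math

# Fig. 1 data: (+ count, - count) for sigma_x, sigma_y, sigma_z.
COUNTS = [(7, 13), (9, 11), (3, 17)]


def _n_log_p(n, p):
    """n * log(p), with the conventions 0 * log(0) = 0 and log(0) = -inf."""
    if n == 0:
        return 0.0
    return n * math.log(p) if p > 0 else -math.inf


def log_likelihood(r, counts=COUNTS):
    """log L(rho) for a qubit with Bloch vector r, given Pauli +/- counts."""
    total = 0.0
    for r_i, (n_plus, n_minus) in zip(r, counts):
        # Born rule for a Pauli measurement: Pr(+/-) = (1 +/- r_i) / 2.
        total += _n_log_p(n_plus, (1 + r_i) / 2)
        total += _n_log_p(n_minus, (1 - r_i) / 2)
    return total


def unconstrained_mle(counts=COUNTS):
    """Bloch vector that maximizes the likelihood, ignoring positivity."""
    return [(n_plus - n_minus) / (n_plus + n_minus) for n_plus, n_minus in counts]


def lam(r, counts=COUNTS):
    """Log likelihood ratio lambda(rho) = -2 log[L(rho) / L_max]."""
    r_hat = unconstrained_mle(counts)
    if sum(x * x for x in r_hat) > 1:
        # Assumption of this sketch: the maximum is an interior point.
        raise ValueError("maximum lies on the boundary; use a constrained optimizer")
    return -2 * (log_likelihood(r, counts) - log_likelihood(r_hat, counts))


def in_lr_region(r, threshold, counts=COUNTS):
    """Membership test of Definition 1: lambda(rho) < threshold."""
    return lam(r, counts) < threshold
```

For example, `in_lr_region([0, 0, 0], 6.25)` asks whether the maximally mixed state lies in the region for an illustrative threshold of 6.25 (roughly the 90% point of a $\chi^2$ variable with 3 degrees of freedom, which the text later describes as too optimistic to guarantee coverage).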

The remainder of this paper attempts to answer three 
natural questions, in order of increasing technicality. 
First, "What is a confidence region, and how does it gen- 
eralize 'error bars'?" Next, "Why is the LR construction 
an especially good one?" Finally, "How do we choose the 
threshold?" The concluding discussion section addresses 
a few other questions, especially the relationship to related
work by Christandl and Renner, and how to use
and describe LR regions.



I. REGION ESTIMATORS AND CONFIDENCE 
REGIONS 

Error bars around a point estimate define a region estimate, but region
estimates do not need to be associated with a point estimate. After
seeing the data ($D$), we can assign a region $\mathcal{R}(D)$ of
whatever shape and size is necessary to achieve our goals. So what are
these goals?

First and foremost is coverage probability. By assigning a region, we
assert that the unknown $\rho$ is within it. This had better be true
with very high probability. Coverage probability ($\alpha$) is the
probability that $\mathcal{R}(D)$ does indeed contain $\rho$. It would
be very satisfying indeed if we could assert "Given the observed data,
the probability that $\rho$ is in $\mathcal{R}(D)$ is at least
$\alpha$," i.e.

$$\Pr(\rho \in \mathcal{R}(D) \,|\, D) \ge \alpha.$$

FIG. 3: Four examples of the shapes that LR regions can take. Each
example is a 90% confidence region for a qubit, based on 60
measurements divided equally among the three Pauli observables. Rows
(a-d) correspond to four distinct datasets. Left and right columns are
different views of the same region.

Unfortunately, this assertion [15] requires assigning a prior
probability to "$\rho \in \mathcal{R}(D)$". Different agents (e.g., a
scientist, a skeptical reader, and a funding agency) will generally
disagree about this prior, and therefore about
$\Pr(\rho \in \mathcal{R}(D) \,|\, D)$.

So, instead, we make a subtly different assertion: "The region
$\mathcal{R}(D)$ that we assign will contain $\rho$ with probability at
least $\alpha$," i.e.

$$\Pr(\rho \in \mathcal{R}(D)) \ge \alpha. \tag{1}$$

This assertion is made before the data $D$ are observed. Once we know
$D$, $\mathcal{R}(D)$ is fixed, and the most that we can say is "This
region was obtained by a procedure that 'works' almost always (i.e.,
with probability $\ge \alpha$)". Now (after the estimate), $\alpha$ is
not the probability of success - for success is no longer a random
variable. Instead, $\alpha$ quantifies our confidence in the estimate,
and so an estimation procedure satisfying this condition is a
confidence region estimator.

Confidence is a property of the entire estimator - the map from data to
regions - rather than of a particular region estimate. It depends on
the unknown state - but, mirabile dictu, we can place relatively tight
prior-independent bounds on it,

$$\Pr(\rho \in \mathcal{R}(D)) \ge \min_{\rho} \Pr(\rho \in \mathcal{R}(D) \,|\, \rho).$$

A confidence region estimator with confidence $\alpha$ satisfies

$$\Pr(\rho \in \mathcal{R}(D) \,|\, \rho) \ge \alpha \quad \forall \rho.$$

It's important to understand that confidence regions do not provide
probabilistic statements about any single run of the experiment. Once
the data are taken, the estimator either succeeded or failed, and there
is no way to assign a probability to its success without choosing a
prior. In any given experiment, what can be said is "We applied a
technique which is guaranteed to yield a region containing the true
$\rho$ at least $\alpha$ of the time - no matter what the unknown
$\rho$ is."
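The coverage condition can be checked exactly in the simplest classical analogue, a coin flipped $N$ times, by summing over all possible datasets. The sketch below is only an illustration under that assumption (a coin rather than a quantum state), with hypothetical function names; it computes $\Pr(p \in \mathcal{R}(D) \,|\, p)$ for any rule that decides whether a bias $p$ belongs to the region assigned to $n$ heads, and provides two such rules: an LR rule with an adjustable threshold and a naive "point estimate plus or minus $z$ standard errors" rule.

```python
import math


def binom_pmf(n, N, p):
    """Probability of n heads in N flips of a coin with bias p."""
    return math.comb(N, n) * p**n * (1 - p) ** (N - n)


def coin_lambda(p, n, N):
    """-2 log[L(p) / L_max] for n heads in N flips (L_max is at p = n/N)."""
    def n_log_p(count, prob):
        if count == 0:
            return 0.0
        return count * math.log(prob) if prob > 0 else -math.inf

    p_hat = n / N
    log_l = n_log_p(n, p) + n_log_p(N - n, 1 - p)
    log_l_max = n_log_p(n, p_hat) + n_log_p(N - n, 1 - p_hat)
    return -2 * (log_l - log_l_max)


def coverage(p, N, in_region):
    """Pr(p in R(D) | p), summed exactly over the datasets n = 0..N."""
    return sum(binom_pmf(n, N, p) for n in range(N + 1) if in_region(p, n, N))


def lr_rule(threshold):
    """Region rule of Definition 1, specialized to a coin."""
    return lambda p, n, N: coin_lambda(p, n, N) < threshold


def error_bar_rule(z):
    """Naive interval: point estimate +/- z standard errors."""
    def rule(p, n, N):
        p_hat = n / N
        half_width = z * math.sqrt(p_hat * (1 - p_hat) / N)
        return abs(p - p_hat) <= half_width
    return rule
```

Evaluating `coverage(p, N, rule)` over a grid of `p` shows how far a given rule is from satisfying the "for all states" condition; for instance, the naive rule assigns a zero-width interval whenever `n` is 0 or `N`, so its coverage collapses for strongly biased coins.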

II. OPTIMALITY 

Confidence regions are a basic statistical construct, especially for
scalar parameters (where they are known as confidence intervals). Yet
even for 1-dimensional intervals, there exist many distinct
constructions, and no consensus on the "best" choice. Note that
designing a high-confidence region estimator isn't hard. For example,
the estimator $\mathcal{R}(D) = \{\text{all states}\}$ has coverage
probability $\alpha = 1$. The challenge is to design one that is
powerful - i.e., assigns small regions (which correspond to powerful
hypotheses, because they rule out many states).

The likelihood ratio construction in Definition 1 was introduced in
1998 in the context of particle physics by Feldman and Cousins [4], and
appears to have been part of statistical folklore before that. What has
not appeared to date is a compelling argument why LR regions are
particularly good. In this section, I provide such a justification,
comprising: (1) a proof that the most powerful confidence region
estimators are probability ratio (PR) estimators (a similar result was
proven by Evans et al. [5]; the treatment here is self-contained), and
(2) a heuristic argument that, among the family of PR estimators, LR
estimators have nearly-optimal worst-case behavior.

First, we need to quantify the power of any given region estimator
$\mathcal{R}(\cdot)$. Smaller regions are clearly more powerful, and in
this work I will quantify a region's power by its volume,

$$V(\mathcal{R}) = \int_{\rho \in \mathcal{R}} d\rho. \tag{2}$$

Choosing a particular volume measure $d\rho$ could be controversial.
Remarkably, the construction in Definition 3 is optimal for any measure
$d\rho$!

Now, the volume of the assigned region is itself a random variable,
depending on the data. Averaging over datasets yields an expected
volume,

$$\overline{V}(\rho) = \sum_D \Pr(D|\rho)\, V(\mathcal{R}(D)),$$

which is a function of $\rho$. Since $\rho$ is (by definition) unknown,
we can quantify the estimator's performance either by worst-case
(maximum) or average volume. To average, we must choose a measure
$\mu = P(\rho)\,d\rho$ over $\rho$ (which need not be related in any
way to $d\rho$). Optimal average performance is achieved by that
measure's Bayes estimator.

Definition 2. Given a cost function $V$, the Bayes estimator for a
given measure $\mu$ (over states) is the estimator with the smallest
average expected cost with respect to $\mu$.

But the Bayes estimator for one measure might have very bad performance
for another measure. So if we are not sure what measure to choose, we
have an alternative: choose an estimator that minimizes the worst-case
expected volume, $\max_\rho \overline{V}(\rho)$. This defines the
minimax estimator. Happily (and perhaps surprisingly), these two
approaches are intimately related by a basic theorem of decision
theory:

Theorem 1. (Minimax-Bayes duality [6, 7]) The minimax estimator (for a
given cost function) is, under mild regularity conditions, also the
Bayes estimator for some measure $\mu$, known as the least favorable
prior.

So to find the minimax estimator - which is appealing because it has
the best possible guaranteed performance - we will focus first on
optimizing average performance

$$\langle V \rangle_\mu = \int \overline{V}(\rho)\, P(\rho)\, d\rho,$$

for an arbitrary measure $P(\rho)\,d\rho$. (N.B. In spite of this
notation, the averaging measure $P(\rho)\,d\rho$ and volume measure
$d\rho$ are completely independent! For example, Eq. 4 doesn't depend
on the choice of volume measure $d\rho$.)

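A region estimator $\mathcal{R}(\cdot)$ can be represented by a
connection relation: for each dataset $D$ and state $\rho$, we say that
$\rho$ is "connected" to $D$ ($\rho \sim D$) iff
$\rho \in \mathcal{R}(D)$. The average expected volume is then given by

$$\begin{aligned}
\langle V \rangle &= \int \overline{V}(\rho)\, P(\rho)\, d\rho \\
&= \int \sum_D \Pr(D|\rho)\, V(\mathcal{R}(D))\, P(\rho)\, d\rho \\
&= \sum_D \Pr(D)\, V(\mathcal{R}(D)),
\end{aligned} \tag{3}$$

where

$$\Pr(D) = \int \Pr(D|\rho)\, P(\rho)\, d\rho. \tag{4}$$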
Now, we recall Eq. 2 for the volume of a region, and observe that the
integral over $\rho \in \mathcal{R}(D)$ is equivalent to an integral
over $\rho \sim D$, which gives

$$\langle V \rangle = \sum_D \Pr(D) \int_{\rho \sim D} d\rho. \tag{5}$$



Now, recall that we want to minimize $\langle V \rangle$ (Eq. 5)
subject to the constraint that

$$\sum_{D \sim \rho} \Pr(D|\rho) \ge \alpha \quad \text{for each } \rho. \tag{6}$$



We can minimize each of the terms in the integral (Eq. 5)
independently - because they are not coupled by the constraint. To do
so, consider a simple cost/benefit analysis. Eq. 6 says that each state
$\rho$ must be connected to datasets whose total conditional
probability $\Pr(D|\rho)$ is at least $\alpha$. But Eq. 5 says that
each such connection comes at a cost, given by the unconditional
probability $\Pr(D)$. To achieve total conditional probability $\alpha$
at minimum cost, we connect $\rho$ to datasets in descending order of
benefit/cost ratio, given by the probability ratio statistic

$$r(D;\rho) = \frac{\Pr(D|\rho)}{\Pr(D)}, \tag{7}$$

down to a threshold $r_\alpha(\rho)$ that satisfies

$$\sum_{\text{all } D \text{ s.t. } r(D;\rho) \ge r_\alpha(\rho)} \Pr(D|\rho) \ge \alpha. \tag{8}$$



Inverting this relationship (to define which states are connected to a
given $D$, rather than the other way around) yields probability ratio
(PR) region estimators:

Definition 3. The PR region estimator for an averaging measure
$\mu = P(\rho)\,d\rho$, with confidence level $\alpha$, is defined by
$\mathcal{R}(D) = \{\text{all } \rho \text{ such that } \Pr(D|\rho)/\Pr(D) \ge r_\alpha(\rho)\}$,
with $\Pr(D)$ and $r_\alpha(\rho)$ given by Eqs. 4 and 8, respectively.

This prescription is an exact solution to the problem of
minimum-average-volume confidence regions - and, as advertised, it does
not depend on what measure $d\rho$ is used to define volume!
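The cost/benefit construction behind Definition 3 can be sketched for a toy problem: a coin with bias $p$ restricted to a finite grid, with the averaging measure represented by weights on that grid. These restrictions, and the function names, are assumptions made purely for illustration. For each $p$, datasets are connected in descending order of the ratio in Eq. 7 until their total conditional probability reaches $\alpha$; inverting the connections gives the region assigned to each dataset.

```python
import math


def binom_pmf(n, N, p):
    """Probability of n heads in N flips of a coin with bias p."""
    return math.comb(N, n) * p**n * (1 - p) ** (N - n)


def pr_regions(N, grid, weights, alpha):
    """Return {n: [grid points p connected to dataset n]} for a PR estimator.

    grid    -- candidate biases, assumed strictly between 0 and 1
    weights -- positive weights on the grid (the averaging measure)
    """
    # Eq. 4: marginal probability of each dataset under the averaging measure.
    marginal = [
        sum(w * binom_pmf(n, N, p) for p, w in zip(grid, weights))
        for n in range(N + 1)
    ]
    regions = {n: [] for n in range(N + 1)}
    for p in grid:
        cond = [binom_pmf(n, N, p) for n in range(N + 1)]
        # Eq. 7: rank datasets by benefit/cost ratio Pr(D|p) / Pr(D).
        order = sorted(
            range(N + 1), key=lambda n: cond[n] / marginal[n], reverse=True
        )
        total = 0.0
        for n in order:
            if total >= alpha:  # Eq. 8: stop once coverage alpha is reached
                break
            regions[n].append(p)
            total += cond[n]
    return regions
```

By construction, every grid point is covered with probability at least $\alpha$, whatever weights are supplied; the weights only change which datasets receive large regions and which receive small ones.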

PR estimators are interesting in themselves. In particular, the
estimator that unconditionally "works best" for $\rho = \rho_0$ (for
any given $\rho_0$) is the PR estimator for the measure
$\mu = \delta(\rho - \rho_0)$. An especially confident experimentalist
who believes that her apparatus really is producing $\rho_0$ might
choose this estimator. Despite the radical "prior", this estimator
still assigns valid confidence regions, on which a skeptical third
party can rely. The experimentalist's extreme confidence is reflected
only in this manner: the datasets $D$ that typically occur when
$\rho = \rho_0$ yield relatively small regions, while datasets $D$ that
are improbable given $\rho = \rho_0$ (but might appear with high
probability for other states) yield enormous regions. Thus, if $\rho$
really is $\rho_0$, then the experimentalist is rewarded with
(moderately) small regions... but if she is wrong and $\rho$ is very
different from $\rho_0$, then the assigned region will probably be so
large as to imply virtually nothing about $\rho$.

So while the extreme choice $\mu = \delta(\rho - \rho_0)$ is a valid
one, it's unwise in practice. It does play an important role by
establishing an absolute (and tight) lower bound on
$\overline{V}(\rho_0)$. But in practice, any sane experimentalist or
analyst will choose a more balanced estimator - one that performs well
even in the worst case. This (as noted above) is the minimax estimator,
and is the PR estimator for the least favorable prior.

Finding exact least favorable priors (LFPs) is arduous and tricky at
best. Worse yet, the LFP will depend (perhaps sensitively) on the exact
volume measure $d\rho$. In order to circumvent this task (which remains
a good challenge for future research), let's apply a simple heuristic
ansatz to choose $\mu$.

Suppose that we choose some $\mu$, and then $\rho$ is chosen by an
adversary so as to maximize $\overline{V}(\rho)$. To do so, the
adversary would look for a dataset $D_0$ and a state $\rho$ such that
$\Pr(D_0|\rho) \gg \Pr(D_0)$ (the latter is determined by
$P(\rho)\,d\rho$). If $\Pr(D_0)$ is relatively small, then its "cost"
will be relatively low, and many states $\rho'$ will be connected to
it - which means that $\mathcal{R}(D_0)$ will be large. But because
$\Pr(D_0|\rho)$ is relatively large, $D_0$ will occur relatively often,
and $\overline{V}(\rho)$ will be large.

Avoiding this vulnerability is simple: ensure that
$\Pr(D|\rho)/\Pr(D)$ is not too large for any $\rho$. This means
choosing $\mu$ so that

$$\Pr(D) \propto \max_{\rho} \Pr(D|\rho). \tag{9}$$



With this choice, the probability ratio statistic (Eq. 7) becomes the
likelihood ratio statistic,

$$r = \frac{\Pr(D|\rho)}{\Pr(D)} \propto \frac{\Pr(D|\rho)}{\max_{\rho'} \Pr(D|\rho')} = \frac{\mathcal{L}(\rho)}{\mathcal{L}_{\max}} \equiv \Lambda, \tag{10}$$

which defines likelihood-ratio (LR) regions (Definition 1) as a special
case of PR regions. We use $\lambda = -2 \log \Lambda$, rather than
$\Lambda$, to maintain a convenient connection with [extensive]
previous work on likelihood ratios.
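The proportionality in Eq. 10 can be checked numerically for a coin. The sketch below takes the dataset distribution of Eq. 9 as given, i.e. it simply normalizes $\max_p \Pr(D|p)$ over datasets; whether an averaging measure exists that reproduces this distribution exactly is not addressed here, in keeping with the heuristic status of the ansatz. Function names are hypothetical.

```python
import math


def binom_pmf(n, N, p):
    """Probability of n heads in N flips of a coin with bias p."""
    return math.comb(N, n) * p**n * (1 - p) ** (N - n)


def max_likelihood(n, N):
    """max over p of Pr(n | p), attained at p = n / N."""
    return binom_pmf(n, N, n / N)


def pr_versus_lr(N, p):
    """Return (Z, [(r, Lambda) for each dataset n]) under the ansatz of Eq. 9."""
    # Normalization constant that turns max_p Pr(D|p) into a distribution.
    z = sum(max_likelihood(n, N) for n in range(N + 1))
    pairs = []
    for n in range(N + 1):
        marginal = max_likelihood(n, N) / z               # Eq. 9
        r = binom_pmf(n, N, p) / marginal                 # Eq. 7
        big_lambda = binom_pmf(n, N, p) / max_likelihood(n, N)  # Eq. 10
        pairs.append((r, big_lambda))
    return z, pairs
```

For every dataset the two statistics differ only by the constant factor `z`, so ranking datasets by $r$ is the same as ranking them by $\Lambda$.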



III. THE THRESHOLD 

For a generic PR region estimator, the threshold value of the statistic
($r_\alpha$) depends significantly on $\rho$. We could define LR
regions in the same way, using a $\rho$-dependent threshold
$\lambda_\alpha(\rho)$. But one of the special attributes of the LR
statistic is that, unlike generic PR statistics, its distribution is
approximately independent of $\rho$ (as shown in Fig. 4). Proving this
independence exactly depends on a Gaussian approximation that does not
quite hold even as $N \to \infty$, but it does hold approximately. So
while exactly optimal regions require calculating a $\rho$-dependent
threshold numerically (see Fig. 4), we can replace
$\lambda_\alpha(\rho)$ with a constant upper bound $\lambda_\alpha$
satisfying

$$\lambda_\alpha \ge \lambda_\alpha(\rho) \quad \text{for all } \rho \tag{11}$$

and obtain a simpler and more elegant estimator that is still
rigorously correct, and only sacrifices a little power. This simplifies
matters - but we still need to set $\lambda_\alpha$!

FIG. 4: This figure shows the tight state-dependent cutoff
$\lambda_\alpha(\rho)$ for a particularly simple case: a qubit measured
only in the $\sigma_z$ basis (or, alternatively, a classical coin). The
cutoff depends only on $\langle \sigma_z \rangle$, so it can be plotted
easily and compared to (i) the $\chi^2$ value, around which it
fluctuates, and (ii) the upper bound given by Eq. 14. While using the
state-dependent cutoff would yield (slightly) smaller regions, there is
a concomitant loss of simplicity, elegance, and convenience (because
the regions become nonconvex).

FIG. 5: Three different bounds on the complementary cumulative
distribution function or CCDF [$F(\lambda_\alpha)$] for the
loglikelihood ratio derived from qubit state tomography (wherein $\rho$
has $k = 3$ degrees of freedom). The horizontal axis is the cutoff
value $\lambda_\alpha$, while the vertical (log-scale) axis is the
associated failure probability $1 - \alpha = F(\lambda_\alpha)$. "Data"
correspond to an exhaustive numerical calculation of the CCDF, and are
compared with the $\chi^2$ approximation (Eq. 12) and the upper bound
given in Eq. 14. The plot labeled "Numeric bound" is a hybrid
calculation where the method used to derive Eq. 14 is augmented by
calculating one hard-to-approximate quantity numerically; its excellent
agreement with data supports Eq. 14 and suggests that it can be
improved.

Coverage probability and region size both increase with
$\lambda_\alpha$. So we want to set it as low as possible (to ensure
powerful regions) while maintaining coverage probability at least
$\alpha$. $\mathcal{R}_\alpha$ will include $\rho$ if and only if
$\lambda(\rho) < \lambda_\alpha$. In principle, we can compute the
distribution of $\lambda(\rho)$ for every possible $\rho$, define a
complementary cumulative distribution function (CCDF)

$$F(\lambda_\alpha | \rho) = \Pr(\lambda(\rho) > \lambda_\alpha \,|\, \rho),$$

and then solve the equation
$\max_\rho F(\lambda_\alpha | \rho) = 1 - \alpha$ for $\lambda_\alpha$.
In practice, computing $F$ is hard, so instead we use upper bounds on
$F$ (as shown in Fig. 5) to set $\lambda_\alpha$, ensuring coverage
probability $\ge \alpha$ at a small cost in power. Two valid and useful
(though not tight) bounds are given in Eq. 14 (whose derivation is
rather arduous, and will be published elsewhere) and Lemma 1 (proven
herein).
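For the classical coin of Fig. 4, the "in principle" procedure can actually be carried out by brute force. The sketch below is a hedged illustration for that special case only: it computes the CCDF exactly by enumerating datasets, approximates the maximum over states by a maximum over a finite grid of biases (so the result is only as good as the grid), and scans a user-supplied list of candidate thresholds. All names are hypothetical.

```python
import math


def binom_pmf(n, N, p):
    """Probability of n heads in N flips of a coin with bias p."""
    return math.comb(N, n) * p**n * (1 - p) ** (N - n)


def coin_lambda(p, n, N):
    """-2 log[L(p) / L_max] for n heads in N flips (L_max is at p = n/N)."""
    def n_log_p(count, prob):
        if count == 0:
            return 0.0
        return count * math.log(prob) if prob > 0 else -math.inf

    p_hat = n / N
    log_l = n_log_p(n, p) + n_log_p(N - n, 1 - p)
    log_l_max = n_log_p(n, p_hat) + n_log_p(N - n, 1 - p_hat)
    return -2 * (log_l - log_l_max)


def ccdf(threshold, p, N):
    """F(threshold | p) = Pr(lambda(p) > threshold | p), exact for a coin."""
    return sum(
        binom_pmf(n, N, p)
        for n in range(N + 1)
        if coin_lambda(p, n, N) > threshold
    )


def worst_case_ccdf(threshold, N, grid):
    """Grid approximation to the maximum of F(threshold | p) over p."""
    return max(ccdf(threshold, p, N) for p in grid)


def smallest_threshold(alpha, N, grid, candidates):
    """Smallest candidate whose worst-case failure probability is <= 1 - alpha."""
    for t in sorted(candidates):
        if worst_case_ccdf(t, N, grid) <= 1 - alpha:
            return t
    return None  # no candidate was large enough
```

Because `ccdf` is a step function of the threshold and varies irregularly with `p`, the worst case over the grid typically sits somewhat above what a smooth approximation would suggest, which is the behavior the state-independent bounds are designed to absorb.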

If the data had a Gaussian distribution, then $\lambda(\rho)$ would be
a $\chi^2_k$ random variable, where $k$ (the number of degrees of
freedom) is the number of linearly independent observables measured,
and is equal to $d^2 - 1$ for an informationally complete set of
measurements. The corresponding CCDF is independent of the true state
$\rho$, and given in terms of an upper incomplete Gamma function by

$$F_{\chi^2}(\lambda_\alpha) = \frac{\Gamma(k/2, \lambda_\alpha/2)}{\Gamma(k/2)} \tag{12}$$

$$\approx \frac{1}{(k/2 - 1)!} \left( \frac{\lambda_\alpha}{2} \right)^{k/2 - 1} e^{-\lambda_\alpha/2}, \tag{13}$$

where the second line is valid as $\lambda_\alpha \to \infty$.
Unfortunately, tomographic data are multinomial, not Gaussian, and this
ansatz is too optimistic. Using it yields a coverage probability that
is $\alpha$ only on average, and can be much lower for some $\rho$. A
much more arduous calculation (to be published separately) yields an
upper bound



F Y 2 + e 



-A a /2 



— k-1 

kyXa when X a — > 00, (14) 



that is valid whenever the data $D$ are obtained from independent
measurements on identically prepared copies of $\rho$ (the standard
tomographic setup). Figure 5 compares these bounds with an exhaustive
numerical calculation of $F(\lambda_\alpha)$ for the $k = 3$ degrees of
freedom found in single-qubit tomography.

A simpler (but looser) bound that applies to any data, including joint
measurements [16], is

Lemma 1. For any measurement on $N$ copies of a $d$-dimensional system,
$F(\lambda_\alpha) \le N^{d^2-1} e^{-\lambda_\alpha/2}$.

Proof. A POVM measurement $\mathcal{M} = \{E_k\}$ (where $E_k \ge 0$
and $\sum_k E_k = \mathbb{1}$) is performed on the state
$\rho^{\otimes N}$, on the $N$-copy Hilbert space
$\mathcal{H}^{\otimes N}$. By Schur's Lemma, we may decompose the
Hilbert space as
$\mathcal{H}^{\otimes N} = \bigoplus_\nu \mathcal{U}_\nu \otimes \mathcal{V}_\nu$,
where $\mathcal{U}_\nu$ and $\mathcal{V}_\nu$ are irreducible
representation spaces of the unitary group $SU(d)$ and the permutation
group $S_N$ (respectively). Because the state is permutation-symmetric,
it is maximally mixed on the $\mathcal{V}_\nu$ factors, and therefore
this measurement is equivalent to a measurement
$\mathcal{M}' = \{E'_k\}$ on the much smaller Hilbert space
$\bigoplus_\nu \mathcal{U}_\nu$, whose dimension is at most
$M = N^{d^2-1}$.

Next, let the probability of event $k$ given $\rho$ be $p_k$, and the
maximum probability of event $k$ be $q_k$, so that
$\lambda = -2 \log(p_k/q_k)$. Rewriting this gives
$p_k = q_k e^{-\lambda/2}$. Now, $q_k \le \mathrm{Tr}(E'_k)$, and
$\sum_k E'_k = \mathbb{1}$, so $\sum_k q_k \le M$. The probability that
$\lambda > \lambda_\alpha$ is
$\Pr(\lambda > \lambda_\alpha) = \sum_{\text{all } k \text{ s.t. } \lambda > \lambda_\alpha} p_k$.
Each term is $p_k = q_k e^{-\lambda/2} < q_k e^{-\lambda_\alpha/2}$, so
the sum is upper bounded by
$M e^{-\lambda_\alpha/2} = N^{d^2-1} e^{-\lambda_\alpha/2}$. □

Equation 14 does not grow with the number of samples measured ($N$), so
it will generally be much tighter than Lemma 1 - but is harder to
derive. And in many cases Lemma 1 may be fine; because
$F(\lambda_\alpha)$ decreases exponentially in all cases, the
$N^{d^2-1}$ factor will enlarge the region's size by at most
polylog($N$).

Since these bounds are loose, their use will produce confidence regions
that are somewhat larger than necessary. We will want to know how
excessively large they are! Fortunately, such a tool is ready to hand.
The $\chi^2$ approximation to $F(\lambda_\alpha)$, given in Eq. 12,
gives a lower bound [17] on $\lambda_\alpha$. This suggests a simple
test:

1. Define a confidence region $\mathcal{R}(D)$ using a value of
$\lambda_\alpha$ obtained by solving Eq. 14 or the equation in
Lemma 1.

2. Define an "inner bound" region $\mathcal{R}_{\min}(D)$ using
$\lambda_\alpha$ obtained by solving Eq. 12.

3. If $\mathcal{R}(D)$ and $\mathcal{R}_{\min}(D)$ are relatively
similar (i.e., $\mathcal{R}(D)$ is not much bigger, and they lead to
similar conclusions about the experiment) then there's nothing to be
gained from using a tighter bound.
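The two thresholds used in this test can be computed in a few lines for the single-qubit case. The sketch below makes two assumptions that go beyond the text: it fixes $k = 3$ so that the $\chi^2$ CCDF of Eq. 12 can be written in the standard closed form $\mathrm{erfc}(\sqrt{x/2}) + \sqrt{2x/\pi}\,e^{-x/2}$, and it treats the Lemma 1 bound as an equality when solving for the threshold. The helper names are hypothetical.

```python
import math


def chi2_3_ccdf(x):
    """CCDF of a chi-squared variable with k = 3 degrees of freedom (Eq. 12)."""
    return math.erfc(math.sqrt(x / 2)) + math.sqrt(2 * x / math.pi) * math.exp(-x / 2)


def solve_decreasing(f, target, lo=0.0, hi=200.0, iterations=200):
    """Bisection for f(x) = target, assuming f is decreasing on [lo, hi]."""
    for _ in range(iterations):
        mid = (lo + hi) / 2
        if f(mid) > target:
            lo = mid
        else:
            hi = mid
    return hi


def chi2_threshold(alpha):
    """Inner-bound threshold: solve the k = 3 case of Eq. 12 for 1 - alpha."""
    return solve_decreasing(chi2_3_ccdf, 1 - alpha)


def lemma1_threshold(alpha, N, d):
    """Solve N**(d*d - 1) * exp(-t / 2) = 1 - alpha for t (Lemma 1)."""
    return 2 * ((d * d - 1) * math.log(N) - math.log(1 - alpha))
```

Comparing `chi2_threshold(0.9)` with `lemma1_threshold(0.9, 60, 2)` for the 60-copy qubit experiment of Fig. 1 gives a sense of how much the looser bound enlarges the threshold; whether that enlargement matters is exactly what step 3 asks the analyst to judge.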



IV. DISCUSSION 

Likelihood-ratio confidence regions define "error bars" 
for quantum tomography that are: 

1. Rigorously guaranteed to capture the true state (or 
process) with controllable probability a, 

2. Approximately as small as can be achieved (within 
that constraint), 

3. Natural, convenient, and intuitive. 

The third point summarizes several particularly nice properties of LR
regions. They are convex for standard tomographic data (because the
negative log likelihood is convex), so they can be manipulated and
characterized using convex programming. Determining whether a given
state $\rho$ lies in $\mathcal{R}(D)$ is even easier - just compare
$\mathcal{L}(\rho)$ to $\mathcal{L}_{\max}$. And LR regions are, in a
sense, a natural generalization of the popular maximum-likelihood (ML)
point estimator. This view is actually backward; whereas ML point
estimators have no finite-sample optimality properties, and may yield
pathological results for quantum tomography, LR regions form a provably
near-optimal region estimator. So a more accurate view is that the
near-optimality of LR regions explains why ML point estimators often
work well: the true state is usually in $\mathcal{R}_\alpha(D)$, and
$\mathcal{R}_\alpha(D)$ is usually a neighborhood of
$\hat{\rho}_{\mathrm{MLE}}$.

Unlike the ellipsoidal regions implied by error bars, or the spherical
ones implied by large deviation bounds, LR regions have shapes that are
variable and data-adapted (see Fig. 3). This is a virtue - they can be
much smaller than the best region of a fixed shape. But it can also be
inconvenient. Many questions about $\rho$ can be answered directly
using the simple implicit description of $\mathcal{R}(D)$ and convex
programming (e.g. "Is $\rho$ definitely separable?" or "What values of
$\langle X \rangle$ can be ruled out?"). Sometimes, though, an explicit
description of $\mathcal{R}(D)$ is required. One can be produced with
reasonable efficiency by sampling from the surface and calculating a
minimum-volume bounding ellipsoid [10] or hypersphere. This trades
power (the approximated region is larger) for convenience.

Most of the error bars or regions used to date in quantum tomography
are based on standard errors - i.e., the variance of a point estimator,
usually the maximum likelihood estimator. As discussed in the
introduction, such regions are not reliable (they may work in many
cases, but not all!). However, a rigorously reliable solution was
proposed [3] quite recently by Christandl and Renner. This excellent
result deserves some discussion here. Although it was obtained entirely
independently of the current work [18], it addresses a very similar
problem and arrives at a solution that (in some ways) is closely
related... via remarkably different methods!

Their main result is that confidence regions with confidence $\alpha$
can be constructed by (i) constructing a Bayesian credible region (a
region containing $\alpha' \approx 1$ of the posterior probability) for
a particular prior (Hilbert-Schmidt measure over quantum states), then
(ii) enlarging the region slightly in a particular way. This proof
elegantly synthesizes Bayesian and frequentist notions, by constructing
regions that are simultaneously confidence regions and credible regions
(at least for a particular prior). It suggests that the two
approaches - while philosophically orthogonal - are deeply related in
some fashion.

On the other hand, the method in Ref. 3 is not explicit. That is, it
does not suggest which high-posterior-probability region to report. The
natural choice is to choose the smallest such region, but it is not
immediately obvious how to identify it. Moreover, there is no obvious
way to determine how powerful this procedure is - i.e., whether it
assigns regions that are, in some sense, smaller than those assigned by
most or all other estimators.

These are exactly the strengths of the likelihood-ratio method given
here. Definition 1 defines a specific, simple, and straightforward
protocol, and we can easily analyze the expected volume of LR regions
and show that they are optimal in a strong sense. The resulting regions
are also relatively easy to characterize using known properties of the
likelihood function (e.g., for standard tomographic data it is
log-concave, so the LR regions are convex). On balance, the LR method
given here seems more practically useful at the present time. However,
it's worth noting that the posterior distribution used to define
regions in Ref. 3 is very closely related to the likelihood
$\mathcal{L}(\rho)$ (thanks to Bayes' Rule). So, under most
circumstances, regions assigned through a sensible interpretation of
Ref. 3 will be quite similar to LR regions (albeit perhaps a bit
larger). This suggests that further research may synthesize both
approaches into a single, uniquely satisfying definition of "error
bars".
[13] This is done by identifying a quantum process $\mathcal{E}$ with a
bipartite state
$\rho(\mathcal{E}) = (\mathbb{1} \otimes \mathcal{E})[|\Psi\rangle\langle\Psi|]$,
where $|\Psi\rangle$ is a maximally entangled state. The differences
between state and process tomography are entirely in implementation,
and they all pertain to gathering data, not to its analysis.

[14] To see this, consider a simple biased point estimator that assigns
the same $\hat{\rho}$ no matter what data is observed. Clearly, the
variance of $\hat{\rho}(D)$ is zero - and, just as clearly, this
implies nothing about our uncertainty about the true $\rho$! While
obviously extreme and even absurd, this estimator nonetheless
demonstrates that the variance of biased estimators cannot be relied
upon to describe uncertainty. In real-world tomography, the maximum
likelihood estimator $\hat{\rho}_{\mathrm{MLE}}$ is biased by the
positivity constraint $\rho \ge 0$, and this can lead to significant
underestimation of uncertainty.

[15] This condition defines a Bayesian credible region.

[16] This bound was inspired by Refs. [3, 12], and by discussions with
Matthias Christandl, for whose help I am grateful!

[17] It's worth being precise here. Eq. 12 is not a lower bound on
$F(\cdot)$. Instead, the true exact $F(\cdot)$ depends on $\rho$, and
fluctuates above and below the $\chi^2$ approximation, as shown in
Fig. 4. Since $F(\cdot)$ is greater than the $\chi^2$ approximation for
some $\rho$, any state-independent upper bound on $F(\cdot)$ will be
strictly greater than the $\chi^2$ approximation, so Eq. 12 is a lower
bound on such upper bounds.

[18] The authors of Ref. 3 and I became aware of each others' work only
when our simultaneous submissions to the QIP 2012 conference were
accepted and merged by the program committee. The present paper is
almost independent of our subsequent collaboration; the exception is
Lemma 1, which was proved by myself but directly inspired by
conversations with Matthias Christandl at ETH-Zurich.



